fix(ci): bump AVM check-circuit per-tx timeout from 30s to 90s #23330
Draft
AztecBot wants to merge 1 commit into
Flakey Tests 🤖 says: This CI run detected 1 test that failed but was tolerated due to a .test_patterns.yml entry.
Summary
The nightly AVM Circuit Inputs Collection and Check workflow run 25952408817 failed in the avm-check-circuit job with "Process completed with exit code 124", the standard timeout(1) SIGTERM exit. No constraint failures; one of the bb-avm avm_check_circuit invocations just took longer than the 30s budget.
Root cause
avm_check_circuit_cmds in yarn-project/end-to-end/bootstrap.sh emits one parallelize command per dumped tx with TIMEOUT=30s. The avm_cc_* tests have no owners entry in .test_patterns.yml, so a single timeout hard-fails the job with no retry.
Reproduced locally against the same commit (downloaded the public S3 artifact tarball, built bb-avm, ran every input):
- e2e_storage_proof heaviest tx: 10s.
- e2e_multiple_blobs/...0x0b6dc9867b...bin (the emit_full_size_public_log batch tx): 23s wall on a 186-core c7a, 21.5s wall (43.6s CPU) with taskset -c 0-3 to simulate the 4-vCPU ubuntu-latest runner.
That single tx was always sitting right against the 30s line. It finally tipped over on a slow runner.
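For readers unfamiliar with the 124 exit status, here is a minimal sketch of how GNU timeout(1) reports a command that overruns its limit. The `sleep 5` is only a stand-in for a slow check-circuit invocation and the 1s limit is illustrative; neither comes from bootstrap.sh.

```shell
# Illustration only: `sleep 5` stands in for a slow invocation, 1s for the budget.
status=0
timeout 1s sleep 5 || status=$?

# timeout(1) sends SIGTERM when the limit expires and itself exits with 124.
echo "exit status: $status"
```

Any command that finishes inside the limit returns its own exit status instead, which is why a 124 points at the time budget rather than at a constraint failure.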
Fix
Raise TIMEOUT to 90s, ~4× the current slow tx's wall time on a 4-vCPU runner. That absorbs runner variance and modest future AVM trace growth while still killing a genuinely hung run quickly.
Full analysis (numbers, per-stage tracegen breakdown, repro steps): https://gist.github.com/AztecBot/a47798ee8721db19667e9b8844f86b3a
ClaudeBox log: https://claudebox.work/s/27ac153ca9b8a375?run=1
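As a quick check on the headroom figures, the sketch below divides the old and new budgets by the 21.5s wall time measured on four pinned cores. The variable names are illustrative and do not appear in the repository.

```shell
# Wall time of the heaviest tx with taskset -c 0-3, taken from the repro above.
slow_tx_wall=21.5

# Headroom = timeout budget / measured wall time, rounded to one decimal place.
old_headroom=$(awk -v t=30 -v w="$slow_tx_wall" 'BEGIN { printf "%.1f", t / w }')
new_headroom=$(awk -v t=90 -v w="$slow_tx_wall" 'BEGIN { printf "%.1f", t / w }')

echo "30s budget: ${old_headroom}x, 90s budget: ${new_headroom}x"
```

The old budget left roughly 1.4× headroom over the measured wall time, while the new one leaves about 4.2×, in line with the ~4× stated under Fix.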